System and method for workload-aware request distribution in cluster-based network servers

ABSTRACT

A method and system for workload-aware request distribution in cluster-based network servers. The present invention provides a web server cluster having a plurality of nodes wherein each node comprises a distributor component, a dispatcher component and a server component. In another embodiment, the present invention provides a method for managing request distribution to a set of files stored on a web server cluster. A request for a file is received at a first node of a plurality of nodes, each node comprising a distributor component, a dispatcher component and a server component. If the request is for a core file, the request is processed at the first node (e.g., processed locally). If the request is for a partitioned file, it is determined whether the request is assigned to be processed locally at the first node or at another node (e.g., processed remotely). If the request is for neither a core file nor a partitioned file, the request is processed at the first node. In one embodiment, the present invention provides a method for identifying a set of frequently accessed files on a server cluster comprising a number of nodes. Embodiments of the present invention operate to maximize the number of requests served from the total cluster memory of a web server cluster and to minimize the forwarding overhead and disk access overhead by identifying the subset of core files to be processed at any node and by identifying the subset of partitioned files to be processed by different nodes in the cluster.

FIELD OF INVENTION

The present invention relates to the field of web servers. Specifically, the present invention relates to a method for workload-aware request distribution in cluster-based network servers.

BACKGROUND OF THE INVENTION

Web server clusters are a popular hardware platform in a web hosting infrastructure. Servers based on clusters of workstations are used to meet the growing traffic demands imposed by the World Wide Web. A cluster of servers, arranged to act as a single unit, provides an incremental scalability as it has the ability to grow gradually with demand. However, for clusters to be able to achieve the scalable performance with the cluster size increase, mechanisms and policies are employed for “balanced” request distribution.

Traditional load balancing solutions are represented by two major groups: 1) Domain Name System (DNS) based approaches; and 2) Internet Protocol (IP)/Transmission Control Protocol (TCP)/Hypertext Transfer Protocol (HTTP) redirection based approaches.

In a DNS based approach, the DNS server returns the IP address list (e.g., a list of nodes in a cluster which can serve this content, placing a different address first in the list for each successive request) to distribute the requests among the nodes in the cluster. Thus, different clients are mapped to different server nodes in the cluster. DNS based approaches are widely used, as they require minimal setup time and provide reasonable load balancing. Further, they use the existing DNS infrastructure (e.g., there is no additional cost). However, DNS based approaches do not recognize either the load of the nodes in a cluster or the content of the request.
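
The address rotation described above can be illustrated with a minimal sketch. It is not a DNS server; the function name `make_rr_dns` and the sample addresses are assumptions made only for illustration.

```python
from collections import deque


def make_rr_dns(addresses):
    """Return a resolver that places a different address first on each query."""
    ring = deque(addresses)

    def resolve():
        answer = list(ring)  # ordering handed to the current client
        ring.rotate(-1)      # the next query sees the following address first
        return answer

    return resolve


# Hypothetical cluster of three nodes.
resolve = make_rr_dns(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = resolve()
second = resolve()
```

Successive clients receive the same set of addresses in a rotated order, which spreads connections across the nodes without any knowledge of node load or request content.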

The second group, IP/TCP/HTTP redirection based approaches, employs a specialized front-end node, the load-balancer, which acts as a single point of contact for the clients and distributes the requests among back-end server nodes in the cluster. These solutions can be classified in the following groups:

-   layer four switching with layer two packet forwarding (L4/2);
-   layer four switching with layer three packet forwarding (L4/3);
-   layer seven switching (L7) or content aware switching.

These terms refer to the techniques by which the systems in the cluster are configured together. In L4/2 and L4/3 clusters, the load-balancer determines the least loaded server in the cluster, to which the packet is then sent; this decision is the job of the proprietary algorithms implemented in different products.

Traditional load balancing solutions for a web server cluster (L4/2 and L4/3) try to distribute the requests among all the back-end machines based on some load information.

The load-balancer can be either a switch or a load-balancing server (e.g., hardware solution) or a software load balancer (e.g., software solution). In both solutions, the load-balancer determines the least loaded server in a cluster to which the packet should be sent.

Load-balancing servers operate by intelligently distributing the incoming requests across multiple web servers. They determine where to send an incoming request, taking into account the processing capacity of attached servers, monitoring the responses in real time and shifting the load onto servers that can best handle the traffic. Load-balancing servers are typically positioned between a router (connected to the Internet) and a local area network (LAN) switch which fans traffic to the Web servers.

FIG. 1A illustrates a block diagram of a typical configuration of a network with a load-balancing server in accordance with the prior art. Client 110 issues a request which is received at load-balancing server 120, located at the front end. Load-balancing server 120 determines which back-end web server (e.g., web servers 130 a and 130 b) gets the request. The decision is based on a number of factors including: the number of servers available, the resources (CPU speed and memory) of each, and how many active TCP sessions are being serviced. All traffic is routed through load-balancing server 120.

FIG. 1B illustrates a block diagram of a typical configuration of a network with a software load-balancer in accordance with the prior art. Client 160 issues a request which is received at server 170 located at the front end, wherein server 170 has stored upon it load-balancing software. The load-balancing software determines which back-end web server (e.g., web servers 180 a and 180 b) gets the request. The decision is based on a number of factors including the number of servers available, the resources (CPU speed and memory) of each, and how many active TCP sessions are being serviced. Once a connection has been established with a particular web server, the web server (e.g., web servers 180 a and 180 b) responds directly to client 160.
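
The selection step common to both configurations can be sketched as follows. Actual products use proprietary algorithms, so the scoring rule here (active TCP sessions divided by a relative capacity weight) and the names `Backend` and `pick_least_loaded` are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    capacity: float       # hypothetical relative weight for CPU speed and memory
    active_sessions: int  # active TCP sessions currently being serviced


def pick_least_loaded(backends):
    """Choose the back end with the fewest sessions per unit of capacity."""
    return min(backends, key=lambda b: b.active_sessions / b.capacity)


# Hypothetical load figures for two back-end web servers.
backends = [Backend("130a", 1.0, 40), Backend("130b", 2.0, 60)]
chosen = pick_least_loaded(backends)
```

With these sample figures the second server is chosen: it carries more sessions in absolute terms but fewer relative to its capacity. Note that the decision ignores which files each server already holds in memory.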

Traditional load balancing solutions for a web server try to distribute the requests evenly among all the back-end machines based on some load information. This adversely affects efficient memory usage because the content is redundantly replicated across the caches of all the web servers, thus resulting in a significant decrease in overall system performance.

Content-aware request distribution (e.g., L7 switching) takes into account the content (can be a Uniform Resource Locator (URL) name, URL type, or cookies) when making a decision to which back-end server the request has to be routed. Content-aware request distribution mechanisms enable intelligent routing inside the cluster to support additional quality of service requirements for different types of content and to improve overall cluster performance. Policies distributing the requests based on cache affinity lead to significant performance improvements compared to the strategies taking into account only load information.

There are three main components comprising a cluster configuration with a content-aware request distribution strategy: the dispatcher, which implements the request distribution strategy and decides which web server will process a given request; the distributor, which interfaces with the client and implements the mechanism that distributes the client requests to a specific web server; and the web server, which processes HTTP requests.

In the content-aware request distribution approach, the cluster nodes are partitioned into two sets: front end and back ends. The front end acts as a smart router or a switch; its functionality is similar to the aforementioned load-balancing software servers. The front-end node implements the policy which routes the incoming requests to an appropriate node (e.g., web server) in the cluster. Content-aware request distribution can take into account both document locality and current load. In this configuration, the typical bottleneck is due to the front-end node that combines the functions of distributor and dispatcher.

To be able to distribute the requests on the basis of the requested content, the distributor component should implement either a form of TCP handoff or the splicing mechanism. Splicing is an optimization of the front-end relaying approach, with the traffic flow represented in FIG. 1A. The TCP handoff mechanism was introduced to enable the forwarding of back-end responses directly to the clients without passing through the front-end, with traffic flow represented in FIG. 1B. This difference in the response flow route allows substantially higher scalability of the TCP handoff mechanism than TCP splicing. In considering different cluster designs for content aware balancing strategies, it is assumed that a distributor component implements some form of TCP handoff mechanism.

FIG. 2A shows a typical cluster configuration 200 with content-aware request distribution strategy and a single front-end 210. In this configuration, the typical bottleneck is due to the front-end node 210 that combines the functions of a distributor 220 and a dispatcher 230. Back-end 240 comprises servers 245 a, 245 b, and 245 c.

Thus, another recent solution is shown in FIG. 2B. It is based on alternative distributed cluster design 250 where the distributor components 260 a, 260 b, and 260 c are co-located with the server components 270 a, 270 b, and 270 c, while the dispatcher component 280 is centralized.

In this architecture the distributor is decoupled from the request distribution strategy defined by the centralized dispatcher module. The switch in front of the cluster can be a simple LAN switch or L4 level load-balancer. For simplicity, we assume that the clients directly contact the distributor, for instance via round-robin DNS (RR-DNS). In this case, the typical client request is processed in the following way: 1) the client web browser uses the TCP/IP protocol to connect to the chosen distributor; 2) the distributor component accepts the connection and parses the request; 3) the distributor contacts the dispatcher for the assignment of the request to a server; 4) the distributor hands off the connection using the TCP handoff protocol to the server chosen by the dispatcher (since in this design the centralized dispatcher is the most likely bottleneck, the dispatcher module resides on a separate node in a typical configuration, as shown in FIG. 2B); 5) the server takes over the connection using the TCP handoff protocol; 6) the server application at the server node accepts the created connection; and 7) the server sends the response directly to the client.

This design shows good scalability properties when distributing requests with the earlier proposed LARD policy. The main idea behind LARD is to logically partition the documents among the cluster nodes, aiming to optimize the usage of the overall cluster RAM. Thus, the requests to the same document will be served by the same cluster node that will most likely have the file in RAM. Clearly, the proposed distributed architecture eliminates the front-end distributor bottleneck, and improves cluster scalability and performance.
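
The effect of logically partitioning documents among nodes can be illustrated with a simplified stand-in. The sketch below is not the LARD policy itself, which also considers load; it is a static hash-based assignment, and the function name `node_for_document` and the node labels are assumptions for illustration only.

```python
import hashlib


def node_for_document(url, nodes):
    """Map a document to one node so repeated requests reach the same cache."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node labels.
nodes = ["node-a", "node-b", "node-c"]
owner = node_for_document("/products/index.html", nodes)
```

Because the mapping depends only on the document name, every request for a given document lands on the same node, so each file occupies RAM on one node rather than on all of them.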

However, under the described policy in a sixteen-node cluster, each node statistically will serve only 1/16 of the incoming requests locally and will forward 15/16 of the requests to the other nodes using the TCP handoff mechanism. TCP handoff is an expensive operation. Besides, the cost of the TCP handoff mechanism can vary depending on the implementation and specifics of the underlying hardware. It could lead to significant forwarding overhead, decreasing the potential performance benefits of the proposed solution.
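
The fractions quoted above follow directly from the cluster size. A minimal sketch, assuming requests arrive uniformly at the nodes and every file is assigned to exactly one node (the function name `local_and_forwarded` is hypothetical):

```python
from fractions import Fraction


def local_and_forwarded(num_nodes):
    """Return the fractions of requests served locally and handed off."""
    local = Fraction(1, num_nodes)
    return local, 1 - local


local, forwarded = local_and_forwarded(16)
```

For sixteen nodes this gives 1/16 served locally and 15/16 forwarded, and the forwarded share approaches one as the cluster grows.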

Web server performance greatly depends on efficient RAM usage. A web server operates much faster when it accesses files from a cache in the RAM. Additionally, the web server's throughput is much higher too.

Accordingly, a need exists for a request distribution strategy that maximizes the number of requests served from the total cluster memory by partitioning files to be served by different servers. A need also exists for a request distribution strategy that minimizes the forwarding and the disk access overhead. Furthermore, a need also exists for a request distribution strategy that accomplishes the above needs and that improves web server cluster throughput.

SUMMARY OF THE INVENTION

The present invention provides a content-aware request distribution strategy that maximizes the number of requests served from the total cluster memory by logically partitioning files to be served by different servers. The present invention also provides a request distribution strategy that minimizes the forwarding and the disk access overhead by assigning a small set of most frequent files (referred to as the core files) to be served by any node in the cluster.

A method and system for workload-aware request distribution in cluster-based network servers are described. The present invention provides a web server cluster having a plurality of nodes wherein each node comprises a distributor component, a dispatcher component and a server component. The distributor component operates to distribute a request to a specific node. The dispatcher component has stored upon it routing information for the plurality of nodes which is replicated across the plurality of nodes. The routing information indicates which node is assigned for processing a request. The server component operates to process the request. In one embodiment, the plurality of nodes are coupled to a network.

In another embodiment, the present invention provides a method for managing request distribution of a set of files stored on a web server cluster. A request for a file is received at a first node of a plurality of nodes, each node comprising a distributor component, a dispatcher component and a server component. If the request is for a core file, the request is processed at the first node. If the request is for a partitioned file, it is determined whether the request is assigned to be processed by the first node (e.g., processed locally). If the request for a partitioned file is assigned to be processed by the first node, the request is processed at the first node. If the request for a partitioned file is assigned to be processed by another node, the request is forwarded to the correct node for processing (e.g., processed remotely). If the request is not for a core file or a partitioned file, the request is processed at the first node.

In one embodiment, the web server cluster also comprises a set of base files, wherein the base files are a set of frequently accessed files fitting into a cluster memory (RAM) of the web server cluster.

In one embodiment, the present invention provides a method for identifying a set of frequently accessed files on a server cluster comprising a number of nodes. A set of base files is defined wherein the base files are a set of frequently accessed files fitting into the cluster memory of the server cluster. The base files are ordered by decreasing frequency of access. The base files are logically partitioned into a subset of core files having a core size, a subset of partitioned files having a partitioned size, and a subset of on disk files which are evicted from the cluster memory (RAM) to a disk. Each subset of files is ordered by decreasing frequency of access, respectively. The core files and partitioned files are identified wherein the total of the partitioned size added to the product of the number of nodes multiplied by the core size is less than or equal to the cluster memory (RAM). The total overhead due to the base files is minimized wherein the total overhead equals the overhead of the core files plus the overhead of the partitioned files plus the overhead of the on disk files.

These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:

FIG. 1A illustrates a block diagram of a typical configuration of a network with a load-balancing server in accordance with the prior art.

FIG. 1B illustrates a block diagram of a typical configuration of a network with a software load-balancer in accordance with the prior art.

FIG. 2A illustrates a block diagram of a typical cluster configuration with content-aware request distribution strategy with a single front-end in accordance with the prior art.

FIG. 2B illustrates a block diagram of a typical cluster configuration with content-aware request distribution strategy with co-located distributor and server, and a centralized dispatcher in accordance with the prior art.

FIG. 3 illustrates a block diagram of a scalable web cluster configuration with workload-aware request distribution strategy with co-located dispatcher, distributor and server, in accordance with an embodiment of the present invention.

FIG. 4 illustrates a block diagram of the logically partitioned memory unit of a web server cluster implementing a workload-aware request distribution strategy in accordance with an embodiment of the present invention.

FIG. 5 is a flowchart diagram illustrating steps in a process of workload-aware request distribution in cluster-based network servers in accordance with an embodiment of the present invention.

FIG. 6 is a flowchart diagram illustrating steps in a process of identifying a set of core files that minimizes the overhead due to the base files in accordance with an embodiment of the present invention.

FIG. 7 is a flowchart diagram of an overall workload-aware request distribution strategy for use in a web server cluster in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in detail in order to avoid obscuring aspects of the present invention.

Some portions of the detailed descriptions which follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here and generally conceived to be a self-consistent sequence of steps of instructions leading to a desired result. The steps are those requiring physical manipulations of data representing physical quantities to achieve tangible and useful results. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “accessing”, “determining”, “storing”, “receiving”, “requesting” or the like, refer to the actions and processes of a computer system, or similar electronic computing device. The computer system or similar electronic device manipulates and transforms data represented as electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.

Portions of the present invention are comprised of computer-readable and computer executable instructions which reside, for example, in computer-usable media of a computer system. It is appreciated that the present invention can operate within a number of different computer systems including general purpose computer systems, embedded computer systems, and stand alone computer systems specially adapted for controlling automatic test equipment.

FIG. 3 illustrates a block diagram of a scalable web cluster configuration 300 with workload-aware request distribution strategy in accordance with one embodiment of the present invention. In this configuration, the distributor components (e.g. distributors 320 a, 320 b and 320 c) and dispatcher components (e.g. dispatcher 330 a, 330 b and 330 c) are co-located with the server components (e.g. server 340 a, 340 b and 340 c) of each node (e.g. node 310 a, 310 b and 310 c). It should be appreciated that cluster configuration can apply to any number of nodes in a cluster, and is not limited to the three nodes illustrated in FIG. 3. In one embodiment, the nodes are coupled to a network.

In one embodiment, the present scalable web cluster design implements a workload-aware request distribution (WARD) load balancing strategy in which a content-aware distribution is performed by each of the nodes in a web cluster. The current architecture is fully distributed. Each node in a cluster performs three different functions:

-   the dispatcher, which implements the request distribution strategy and decides which web server will process a given request;
-   the distributor, which interfaces with the client and implements the TCP handoff mechanism that distributes the client requests to a specific web server; and
-   the web server, which processes the requests.

In the present embodiment, the dispatcher component, which is replicated across all the cluster nodes, has the same routing information in all the nodes. The routing information indicates which node of the cluster is assigned to process each requested file. This routing information is defined by the off-line workload analysis process and a workload-aware distribution strategy (WARD). The distributor component of each node distributes a request to a specific node in the cluster. The server component of each node processes the request.

In the present embodiment, each node, after receiving a request, reviews the local dispatcher component routing table. The node then either accepts the request for local processing by the server component or forwards the request to the server component of another node for remote processing.

The present invention takes into account workload access patterns and cluster parameters such as number of nodes in a cluster, node RAM size, TCP handoff overhead, and disk access overhead. The present invention utilizes the overall cluster RAM more efficiently, leading to improved web server cluster performance. The distribution (routing) strategy WARD is defined by off-line analysis of the joint set of all web server cluster logs during a certain time interval (e.g., daily analysis). The off-line analysis logically splits all the files into the following three groups:

-   Files_(core): a small set of the most frequently accessed files, called the core, the requests to which are processed locally by any server in the cluster;
-   Files_(part): the files the requests to which are partitioned to be served by different cluster nodes; and
-   Files_(on disk): the files the requests to which are processed locally by any server in the cluster.

FIG. 4 illustrates a block diagram of the partitioned memory unit 410 of a web server cluster 400 implementing a workload-aware request distribution strategy in accordance with one embodiment of the present invention. In one embodiment, memory unit 410 is the cluster RAM (e.g., the combined RAM of all nodes of the cluster). Memory unit 410 comprises a number of partitioned memory units or “nodes” (e.g., nodes 430 a, 430 b, and 430 c). It should be appreciated that memory unit 410 may comprise any number of nodes. It should be further appreciated that each node can reside on an independent computer system, and is not limited to the case of a partitioned memory unit.

In one embodiment, base files 440 are the set of files that fit into the cluster RAM (e.g., memory unit 410). In the present embodiment, base files 440 comprise web files for use in permitting a remote client 420 to access the files over the Internet. Disk 470 is a remote location distinct from the cluster RAM, wherein files evicted from the cluster RAM are stored on disk 470.

Under the strategy presented in the present invention, the base files 440 are represented by the three groups of files: Files_(core) and Files_(part) in the ClusterRAM (e.g., memory unit 410), and Files_(on disk) consisting of files evicted from RAM to disk (e.g., disk 470) due to the expansion of the Files_(core). Each node comprises a core section 450 and a partitioned section 460 for storing Files_(core) and Files_(part), respectively.

Web server performance greatly depends on efficient memory usage. The throughput of a web server is higher when it reads pages from a cache in memory than from disk. If all files of the web site fit in memory the web server demonstrates excellent performance because only the first request for a file will require a disk access, and all the following file accesses will be served from memory. The present invention provides a method and system for achieving the goals of maximizing the number of requests served from the total cluster memory by partitioning files to be served by different servers and minimizing the forwarding overhead by identifying the subset of core files to be processed on any node (e.g., allowing the replication of these files in the memories across the nodes).

It is appreciated that processing the requests to the core files locally by each cluster node helps to minimize the forwarding overhead. However, it may result in additional, initial disk accesses to core files on all those nodes and extra disk accesses because more files will reside on disk due to the expansion of the core files. This is why the ultimate goal here is to identify such a subset of core files for which the forwarding overhead savings are higher than the additional cost of the disk accesses caused by the core files.

FIG. 5 is a flowchart diagram illustrating steps in a process 500 of workload-aware request distribution in cluster-based network servers in accordance with one embodiment of the present invention.

At step 510 of process 500, a request for a file is received at a node of the web server cluster. In one embodiment, the request is an HTTP request sent by a remote client. Each node comprises a dispatcher component, a distributor component, and a server component (see scalable web cluster configuration 300 of FIG. 3, supra).

At step 520, it is determined whether the requested file is a core file (e.g., a frequently accessed file assigned to be served by any node). In one embodiment, the dispatcher component reviews the routing information to determine whether the requested file is a core file.

If it is determined that the requested file is a core file, as shown at step 530, the server component of the receiving node processes the requested file.

If it is determined that the requested file is not a core file, as shown at step 540, it is then determined whether the requested file is a partitioned file (e.g., a file assigned to be served by a particular node in a cluster). In one embodiment, the dispatcher component reviews the routing information to determine whether the requested file is a partitioned file.

If the requested file is not a partitioned file, as shown at step 550, the requested file is served locally from the receiving node.

If the requested file is a partitioned file, as shown at step 560, it is determined whether the requested file is assigned to be processed by the receiving node. If it is determined that the requested file is assigned to be processed by the receiving node, as shown at step 530, the requested file is served locally from the receiving node. In one embodiment, the server component processes the requested file.

If it is determined that the requested file is not assigned to be processed by the receiving node, as shown at step 570, the distributor component forwards the request to the remote node designated by the dispatcher component. In one embodiment, the request is processed at the remote node by the server component of the remote node.

At step 580, process 500 ends. Process 500 is repeated for every request received by the cluster.
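
The decision flow of process 500 can be sketched as a single function. The representation of the routing information as a set of core paths plus a dictionary from partitioned paths to node names is an assumption made for illustration, as are the names `route_request`, `core_files` and `partition_map`.

```python
def route_request(path, receiving_node, core_files, partition_map):
    """Return the node that should serve `path`, following the steps of FIG. 5.

    core_files: set of paths that any node may serve.
    partition_map: dict mapping each partitioned path to its assigned node.
    """
    if path in core_files:              # step 520 leads to step 530
        return receiving_node
    assigned = partition_map.get(path)  # step 540
    if assigned is None:                # step 550: neither core nor partitioned
        return receiving_node
    # Step 560: the result is either the receiving node (step 530)
    # or a remote node to which the request is forwarded (step 570).
    return assigned


# Hypothetical routing information replicated on every node.
core = {"/index.html"}
parts = {"/a.html": "310a", "/b.html": "310b"}
target = route_request("/b.html", "310a", core, parts)
```

A forward (TCP handoff) is needed only when the returned node differs from the receiving node, which in this flow happens solely for partitioned files assigned elsewhere.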

FIG. 6 is a flowchart diagram illustrating steps in a process 600 of identifying a set of core files and partitioned files that minimizes the overhead due to the base files in accordance with one embodiment of the present invention.

At step 610 of process 600, a set of base files is defined. The base files are a set of frequently accessed files fitting into the cluster memory (RAM) of a web server cluster. In one embodiment, the cluster memory is RAM. In one embodiment, the base files are ordered by decreasing frequency of access.

At step 620, the base files are logically partitioned into a set of core files having a core size, a set of partitioned files having a partitioned size, and a set of on disk files. In one embodiment, the base files comprising each set of files are ordered by decreasing frequency of access.

At step 630, the files comprising the core files and the partitioned files are identified, wherein the total of the partitioned size added to the product of the number of nodes multiplied by the core size is less than or equal to the cluster memory.

In one embodiment, the frequencies of access (the number of times a file was accessed) and sizes of individual files are used to determine the core set of files. These are denoted by FileFreq and FileSize, respectively. These are gathered by analyzing web-server access logs from the cluster. Freq-Size is the table of all accessed files with their frequencies and file sizes. This table is sorted in decreasing frequency order. The determination of the contents of the core files assumes that the cache replacement policy of the file cache in the web-server has the property that the most frequent files will most likely be in the ClusterRAM, wherein ClusterRAM is defined as the total size of all the file caches in the cluster.
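
A Freq-Size table of this kind might be built as in the following sketch. It assumes the access logs have already been parsed into (path, size in bytes) pairs; the log parsing itself, the function name `build_freq_size` and the sample entries are assumptions, not part of the described method.

```python
from collections import Counter


def build_freq_size(log_entries):
    """Build rows of (path, FileFreq, FileSize), most frequent first.

    log_entries: iterable of (path, size_in_bytes) pairs, one per request.
    """
    freq = Counter()
    size = {}
    for path, nbytes in log_entries:
        freq[path] += 1
        size[path] = nbytes  # keep the most recently seen size for the file
    rows = [(path, freq[path], size[path]) for path in freq]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical parsed log entries from all cluster nodes.
entries = [("/a", 100), ("/b", 300), ("/a", 100),
           ("/c", 50), ("/a", 100), ("/b", 300)]
table = build_freq_size(entries)
```

Merging the logs of all nodes before counting matters here, because the frequency of a file is a property of the whole cluster workload rather than of any single node.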

If all the files were partitioned across the cluster nodes, the most probable files to be in the cluster RAM would be the most frequent files that fit into the cluster RAM. The set of files that fit into the cluster RAM is called BaseFiles (e.g., base files 440 of FIG. 4). The maximum number of the BaseFiles are stored in the ClusterRAM (e.g., memory unit 410 of FIG. 4 or cluster RAM), at a price that $\frac{N - 1}{N}$ of the requests coming to each node of the total N nodes have to be handed off. Under the present invention, BaseFiles are represented by three groups of files as shown in Equation 1: Files_(core) and Files_(part) in the ClusterRAM, and Files_(on disk) consisting of BaseFiles that do not fit into ClusterRAM due to the expansion of Files_(core). They satisfy Equations 1 and 2:

$${BaseFiles} = {Files}_{part} + {Files}_{core} + {Files}_{on\ disk} \qquad \text{(Equation 1)}$$

Wherein:

-   Files_(core) are the files belonging to the core, the requests to these files are served locally by any node, and having a size Size_(core), the combined size (in bytes) of the files in Files_(core);
-   Files_(part) are files belonging to the partition, the requests to these files are served by a particular prescribed node (i.e., they are forwarded using TCP handoff to be processed by a particular node in a cluster), and having a size Size_(part), the combined size (in bytes) of the files in Files_(part); and
-   Files_(on disk) are files belonging to neither the core nor the partition, the requests to these files are served locally by any node (e.g., these are the files which are most likely to reside on disk).

$$N \times {Size}_{core} + {Size}_{part} \leq {ClusterRAM} \qquad \text{(Equation 2)}$$
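The selection of BaseFiles and the memory constraint of Equation 2 can be sketched as follows. This is a minimal illustration under stated assumptions: rows are (path, FileFreq, FileSize) tuples already sorted in decreasing frequency order, BaseFiles is taken to be the longest most-frequent prefix that fits into ClusterRAM, and the function names are hypothetical.

```python
def select_base_files(freq_size_table, cluster_ram):
    """Return the most frequent files whose combined size fits in cluster_ram.

    Assumption: selection stops at the first file that does not fit.
    """
    base_files, used = [], 0
    for path, file_freq, file_size in freq_size_table:
        if used + file_size > cluster_ram:
            break
        base_files.append((path, file_freq, file_size))
        used += file_size
    return base_files


def satisfies_equation_2(num_nodes, files_core, files_part, cluster_ram):
    """Check N x Size_core + Size_part <= ClusterRAM (Equation 2)."""
    size_core = sum(file_size for _, _, file_size in files_core)
    size_part = sum(file_size for _, _, file_size in files_part)
    return num_nodes * size_core + size_part <= cluster_ram
```

The factor N appears because every core file is cached on each of the N nodes, whereas a partitioned file is cached on one node only.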

The ideal case for web server request processing is when a request is processed locally (e.g., it does not incur an additional forwarding overhead (ForwardOH)) and it is processed from the node RAM (e.g., it does not incur an additional disk access overhead (DiskOH)). The goal is to identify a set of Files_(core) and a set of Files_(part) that minimizes the total overhead due to BaseFiles:

$${OH}_{BaseFiles} = {OH}_{core} + {OH}_{part} + {OH}_{on\ disk} \qquad \text{(Equation 3)}$$

Wherein:

-   OH_(BaseFiles) is the total overhead due to BaseFiles;
-   OH_(core) is the overhead due to Files_(core);
-   OH_(part) is the overhead due to Files_(part); and
-   OH_(on disk) is the overhead due to Files_(on disk).

Still with reference to FIG. 6, at step 640, the total overhead due to the base files is minimized, wherein the total overhead equals an overhead of the core files plus an overhead of the partitioned files plus an overhead of the on disk files.

First, consider the additional overhead incurred by processing the requests to Files_(part), denoted as OH_(part). Assuming all these files are partitioned to be served by different nodes, statistically a request for a file in the partition incurs forwarding overhead $\frac{N - 1}{N}$ of the time on average, where N is the number of nodes in the cluster. A file from the partition will also incur one disk access on the node it is assigned to, the first time it is read from disk. This reasoning gives the following overhead for the partition files:

$${Penalty}_{forward} = \frac{N - 1}{N} \times {FileFreq} \times {ForwardOH} \qquad \text{(Equation 4)}$$

$${Penalty}_{DiskAccess} = {FileSize} \times {DiskOH} \qquad \text{(Equation 5)}$$

$${OH}_{part} = \sum_{{Files}_{part}} \left( {Penalty}_{forward} + {Penalty}_{DiskAccess} \right) \qquad \text{(Equation 6)}$$

where ForwardOH is the processing time in μsec the TCP handoff operation consumes, and DiskOH is the extra time in μsec it generally takes to read one byte from disk compared to from RAM.
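Equations 4 through 6 can be evaluated directly, as in the following sketch. The function name, the (FileFreq, FileSize) tuple layout, and the numeric values used in the usage note are assumptions chosen for illustration; they are not measured figures.

```python
def partition_overhead(files_part, num_nodes, forward_oh, disk_oh):
    """Compute OH_part (Equation 6) for an iterable of (FileFreq, FileSize).

    forward_oh: time in microseconds consumed by one TCP handoff.
    disk_oh: extra microseconds per byte read from disk rather than RAM.
    """
    total = 0.0
    for file_freq, file_size in files_part:
        # Equation 4: a fraction (N - 1)/N of the requests is forwarded.
        penalty_forward = (num_nodes - 1) / num_nodes * file_freq * forward_oh
        # Equation 5: the file is read from disk once on its assigned node.
        penalty_disk_access = file_size * disk_oh
        total += penalty_forward + penalty_disk_access
    return total
```

With hypothetical values of 4 nodes, ForwardOH of 300 μsec and DiskOH of 0.01 μsec per byte, two files with (FileFreq, FileSize) of (100, 1000) and (10, 2000) give an OH_(part) of 22510 + 2270 = 24780 μsec.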

Next, determine the additional overhead incurred by processing the requests to Files_(core). If a file belongs to the core, then a request to such a file can be processed locally (e.g., with no additional forwarding overhead for these files). The drawback is that the files have to be read from disk into memory once on all the nodes in the cluster and that the number of files in Files_(on disk) increases due to the expansion of Files_(core), creating additional disk access overhead. However, this is under the assumption that the files are accessed frequently enough that at least one request for each file will end up on all nodes. For files that are accessed less frequently this number is expected to be lower; thus it is necessary to calculate the expected value of the number of nodes that get at least one access to a file given a certain frequency f and a number of nodes N.

$$E(f) = \sum_{i = 1}^{N} i \cdot P(f, i) \qquad \text{(Equation 7)}$$

Here P(f,i) is the probability that exactly i nodes will have the file after f references to it. It can be calculated using the following recursion and starting conditions.

$$\begin{aligned}
P(f + 1, i) &= P(f, i - 1) \cdot \frac{N - (i - 1)}{N} + P(f, i) \cdot \frac{i}{N} \\
P(0, 0) &= 1 \\
P(0, 1) &= P(0, 2) = \ldots = P(0, N) = 0 \\
P(1, 0) &= P(2, 0) = \ldots = P(\infty, 0) = 0
\end{aligned} \qquad \text{(Equation 8)}$$
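The recursion of Equation 8 and the expectation of Equation 7 can be evaluated iteratively, as in the sketch below. The function name expected_nodes is hypothetical, and the sketch assumes, as the recursion itself implies, that each request arrives at any of the N nodes with equal probability.

```python
def expected_nodes(f, n):
    """Return E(f): expected number of nodes that see a file after f requests.

    p[i] holds P(refs, i), the probability that exactly i of the n nodes
    have the file after `refs` references (Equation 8).
    """
    p = [1.0] + [0.0] * n  # Starting conditions: P(0, 0) = 1, P(0, i > 0) = 0.
    for _ in range(f):
        nxt = [0.0] * (n + 1)  # P(refs >= 1, 0) = 0.
        for i in range(1, n + 1):
            # Either the request reaches a new node, or one that has the file.
            nxt[i] = p[i - 1] * (n - (i - 1)) / n + p[i] * i / n
        p = nxt
    # Equation 7: weight each node count by its probability.
    return sum(i * p[i] for i in range(1, n + 1))
```

For example, expected_nodes(2, 2) evaluates to 1.5: after two requests to a two-node cluster, both requests reach the same node with probability 0.5 and different nodes with probability 0.5.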

The overhead due to extra disk accesses to core files, denoted as OH_(core), can then be calculated as follows.

$${OH}_{core} = \sum_{{Files}_{core}} E({FileFreq}, N) \times {DiskOH} \times {FileSize} \qquad \text{(Equation 9)}$$

Finally, the requests to Files_(on disk) will incur additional disk overhead every time these files are accessed, which gives the following equation.

$${OH}_{on\ disk} = \sum_{{Files}_{on\ disk}} {FileFreq} \times {DiskOH} \times {FileSize} \qquad \text{(Equation 10)}$$

Using the reasoning and the equations above, a set Files_(core) that minimizes the total overhead due to BaseFiles can be computed.
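The specification does not state a particular search procedure at this point. One possible approach, offered purely as an assumption, is to try every core consisting of the k most frequent BaseFiles, fill the remaining ClusterRAM with the next most frequent files as the partition, treat the rest as on-disk files, and keep the k with the lowest total overhead per Equation 3. All function names below are hypothetical.

```python
def expected_nodes(f, n):
    """E(f) of Equation 7, computed with the recursion of Equation 8."""
    p = [1.0] + [0.0] * n
    for _ in range(f):
        p = [0.0] + [p[i - 1] * (n - (i - 1)) / n + p[i] * i / n
                     for i in range(1, n + 1)]
    return sum(i * p[i] for i in range(1, n + 1))


def total_overhead(base_files, k, num_nodes, cluster_ram, forward_oh, disk_oh):
    """OH_BaseFiles (Equation 3) when the first k files form the core.

    base_files: (FileFreq, FileSize) pairs in decreasing frequency order.
    Returns None when the core alone violates Equation 2.
    """
    core = base_files[:k]
    budget = cluster_ram - num_nodes * sum(size for _, size in core)
    if budget < 0:
        return None
    # Equation 9: each core file is read from disk on every node it reaches.
    overhead = sum(expected_nodes(freq, num_nodes) * disk_oh * size
                   for freq, size in core)
    on_disk = False
    for freq, size in base_files[k:]:
        if not on_disk and size <= budget:
            # Partitioned file: Equations 4 to 6.
            budget -= size
            overhead += (num_nodes - 1) / num_nodes * freq * forward_oh
            overhead += size * disk_oh
        else:
            # Assumption: once a file does not fit, the rest stay on disk.
            on_disk = True
            overhead += freq * disk_oh * size  # Equation 10
    return overhead


def best_core_size(base_files, num_nodes, cluster_ram, forward_oh, disk_oh):
    """Return (k, overhead) for the prefix core with the lowest overhead."""
    best_k, best_oh = 0, None
    for k in range(len(base_files) + 1):
        oh = total_overhead(base_files, k, num_nodes, cluster_ram,
                            forward_oh, disk_oh)
        if oh is None:
            break  # A larger core cannot fit either.
        if best_oh is None or oh < best_oh:
            best_k, best_oh = k, oh
    return best_k, best_oh
```

Restricting the core to a prefix of the frequency-sorted list keeps the search linear in the number of files; an exhaustive search over arbitrary subsets would be another option at far greater cost.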

FIG. 7 is a flowchart diagram of an overall workload-aware request distribution strategy for use in a web server cluster in accordance with an embodiment of the present invention.

At step 710 of process 700, a fileset profile is built for a combined set of web server access logs in a cluster. In one embodiment, a table of all accessed files with their file frequency (the number of times a file was accessed during the observed period) and their file size is built. This table is sorted in decreasing file frequency order.

At step 720, a WARD mapping is built. Using process 600 of FIG. 6, Files_(core) and Files_(part) are computed. All files that do not belong to Files_(core) or Files_(part) are denoted as Files_(on disk). Files_(part) are further partitioned among the N nodes in the cluster in some balanced manner (e.g., according to a round-robin policy) such that a request to a file from Files_(part) is processed by a particular node in the cluster.
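A round-robin assignment of Files_(part) to nodes, one of the balanced policies mentioned above, might look like the following sketch. The function name, the use of integer node indices 0 to N - 1, and the returned data structures are assumptions made for illustration.

```python
def build_ward_mapping(files_core, files_part, num_nodes):
    """Return (core_set, partition_owner) for a WARD mapping.

    core_set: paths that any node may serve locally.
    partition_owner: path -> index of the node designated to serve it,
    assigned round-robin in the order the partitioned files are given.
    """
    core_set = set(files_core)
    partition_owner = {}
    for index, path in enumerate(files_part):
        partition_owner[path] = index % num_nodes
    return core_set, partition_owner
```

Files absent from both structures are implicitly Files_(on disk). Round-robin balances the number of files per node; balancing by request frequency or by size would be alternative policies.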

At step 730, once the WARD mapping is built, the dispatcher component in each cluster node will enforce the following WARD routing strategy:

-   If in core: serve locally
-   If in partition and local: serve locally
-   If in partition and remote: send to designated remote node
-   Everything else: serve locally
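The routing strategy above reduces to a short decision function. In this sketch the function name route, the integer node identifiers, and the representation of the WARD mapping as a set of core paths plus a path-to-node dictionary are all assumptions for illustration.

```python
def route(path, local_node, core_set, partition_owner):
    """Return the node that should serve the request for `path`.

    local_node: identifier of the node that received the request.
    core_set: set of core file paths.
    partition_owner: dict mapping partitioned file paths to their node.
    """
    if path in core_set:
        return local_node              # In core: serve locally.
    owner = partition_owner.get(path)
    if owner is None:
        return local_node              # Everything else: serve locally.
    if owner == local_node:
        return local_node              # In partition and local.
    return owner                       # In partition and remote: forward.
```

The lookup result is compared against None explicitly because node identifier 0 is a valid owner.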

At step 740, the distributor component in each cluster node will either process the request locally or forward it to the corresponding node in the cluster, according to the directions of its corresponding dispatcher component.

By monitoring the traffic to a web cluster and analyzing it (for example, on a daily basis), WARD proposes a new balancing schema where the files (and requests to them) are classified into three groups: Files_(core), Files_(part) and Files_(on disk).

The preferred embodiment of the present invention, a method and system for workload-aware request distribution in cluster-based network servers, is thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.

1-7. (canceled)
8. A method for managing request distribution to a set of files stored on a server, said method comprising the steps of: a) receiving a request for a file at a first node of a plurality of nodes, each of said nodes comprising a distributor component for distributing a request to a specific node of said plurality of nodes, a dispatcher component comprising routing information for said plurality of nodes and replicated across said plurality of nodes, and a server component for processing said request; b) provided said request is for a core file, serving said core file from said first node irrespective of which of the nodes is the first node that received the request; c) provided said request is for a partitioned file, determining whether said request is assigned to be processed by said first node; c1) provided said request is for a partitioned file assigned to be processed by said first node, serving said partitioned file from said first node; and c2) provided said request is for a partitioned file assigned to be processed by another node of said plurality of nodes, forwarding said request to a specific node of said plurality of nodes as indicated by said dispatcher component of said first node and serving said partitioned file from said specific node.
9. The method of claim 8 wherein said plurality of nodes form a web server cluster.
10. The method of claim 9 further comprising a set of base files, wherein said base files are a set of frequently accessed files fitting into a cluster memory of said web server cluster.
11. The method of claim 10 wherein said set of base files comprises a set of core files comprising said core file, a set of partitioned files comprising said partitioned file, and a set of on disk files.
12. The method of claim 8 wherein each of said plurality of nodes further comprises a set of core files comprising said core file and a set of partitioned files comprising said partitioned file.
13. The method of claim 12 wherein said set of core files comprises a set of most frequently accessed files of said set of base files.
14-20. (canceled)
21. The method of claim 8 further comprising: d) provided said request is not for a said core file or a said partitioned file, serving the requested file from said first node.
22. A method for managing request distribution to a set of files stored on a server, said method comprising the steps of: a) storing a set of core files to each of a plurality of nodes; b) assigning processing of each of a set of partitioned files to a respective one of said plurality of nodes; c) receiving a request for a file at a first node of said plurality of nodes; d) provided said request is for one of said core files, serving the requested core file from said first node irrespective of which of the nodes is the first node that received the request; e) provided said request is for a partitioned file, determining whether said request is assigned to be processed by said first node; f1) provided said request is for a partitioned file assigned to be processed by said first node, serving said partitioned file from said first node; and f2) provided said request is for a partitioned file assigned to be processed by another node of said plurality of nodes, forwarding said request to said another node and serving said partitioned file from said another node.
23. The method of claim 22 further comprising: g) provided said request is not for a said core file or a said partitioned file, serving the requested file from said first node.
24. The method of claim 22 wherein said plurality of nodes form a web server cluster.
25. The method of claim 22 further comprising a set of base files, wherein said base files are a set of frequently accessed files fitting into memory of said plurality of nodes.
26. The method of claim 25 wherein said set of base files comprises said set of core files, said set of partitioned files, and a set of on disk files.
27. The method of claim 25 wherein said set of core files comprises a set of most frequently accessed files of said set of base files.